Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237

Merged

Conversation

abishekg7
Copy link
Collaborator

This PR enables the GPU execution of atm_divergence_damping_3d subroutine.

Tested with a real and idealized test cases.

@mgduda mgduda requested review from mgduda and gdicker1 October 19, 2024 00:26
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels Oct 19, 2024
Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tested this today for the merged branch (develop+atmosphere/port_divergence_damping_3d) versus the base branch (develop), the results matched!

However, I have some comments to address:

@@ -2599,6 +2599,13 @@ subroutine atm_divergence_damping_3d( state, diag, mesh, configs, dts, edgeStart
rdts = 1.0_RKIND / dts
coef_divdamp = 2.0_RKIND * smdiv * config_len_disp * rdts

MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, specZoneMaskEdge, &
!$acc theta_m, cellsOnEdge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specZoneMaskEdge and cellsOnEdge are invariant fields (the won't change after init). I believe your PR should make sure these variables are handled in atm_dynamics_init and shouldn't copyin or delete these fields in atm_divergence_damping_3d.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the convention we've been following so far? I assumed the routine to be ported has localized data movement clauses. I might also need to rework my other PRs if this is the case. Thanks for pointing it out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to follow up.. Talked to @mgduda, and got this cleared up. Will make changes to these invariant fields. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the convention we've been following so far?

I've been following that pattern, it was set in #1176.

I've been doing things similarly to what may happen here - do all the data movement that is needed locally for the function being ported, and then separate out those invariant fields and handle them in the atm_dynamics_{init,finalize} routines. Typically these invariant fields are part of the mesh var_struct as described in the src/core_atmosphere/Registry.xml.

!$acc theta_m, cellsOnEdge)
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]')

!$acc parallel async
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async could be dangerous here, I don't believe acc exit data copyout(ru_p) will wait for this kernel to finish before beginning to copy.

I think it would be best to remove this async since we don't need it (yet) and it could create issues for the kernels that come next and depend on ru_p

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove the async. thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?

I think we're good to leave that async clause for now, it has a matching wait directive/clause later on.

@abishekg7
Copy link
Collaborator Author

I've addressed the review comments, and also checked the output restart file for bit-for-bit reproducibility.

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fixes and my testing1, this PR has my approval.

Footnotes

  1. Using the regional test case for a few timesteps and comparing the restart, diag, and history output files. The baseline and "feature" branch tests matched!


MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m)
!$acc end data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this !$acc end data clause needed? (That's an actual, not a rhetorical, question!)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed because of the acc data present, which is currently written as a structured data region. Structured regions in Fortran OpenACC (parallel and data) require an acc end xxx statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gdicker1 Thanks for that explanation! The !$acc data present directive didn't register when I was scanning through the code changes.

@abishekg7 Perhaps it would be better to follow the pattern in other routines that have been ported, and to avoid the use of structured data directives for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to add default(present) clauses to the !$acc parallel directives, that would obviate the need for the !$acc data present(specZoneMaskEdge, cellsOnEdge) directive I would think.

!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, theta_m)
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]')

!$acc parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the !$acc data present directive above on line 2611, it might be simpler and more robust to just add a default(present) clause to this parallel directive.

@gdicker1 Does this align with your understanding of best practices?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best practices would be neutral on this, whatever works best for our code and development. However, I agree that default(present) could be simpler and require less of us.

We would have to ask Pranay or other MPAS v7 OpenACC developers why they have similar function-level acc data present ... directives in their port. Speculation here: I think enclosing the kernels in a structured data region that ensures all data is present prevents the kernels from each checking their data - an optimization concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding the default(present) for now, but noting the potential concern re. performance.

!$acc end parallel

MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of clarity, I would suggest either having separate directives for copyout and delete, or at least splitting the copyout and delete clauses onto separate lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mgduda mgduda self-requested a review January 15, 2025 02:06
@mgduda
Copy link
Contributor

mgduda commented Jan 15, 2025

@abishekg7 The code changes look good to me! I'd only suggest reworking the commit history; then I think we'll be all set.

This commit enables the GPU execution of the atm_divergence_damping_3d
subroutine using OpenACC directives for the data movements and loops. A new
timer, 'atm_divergence_damping_3d [ACC_data_xfer]', has been added for host-
GPU data transfers in this subroutine. This port follows these considerations:

- The data transfers for the invariant fields have been moved to
  mpas_atm_dynamics_init and finalize.
- Explicitly adding gang, worker and vector level parallelism to parallel
  constructs in atm_compute_vert_imp_coefs_work.
- The parallel constructs use default(present) clauses to avoid implicit
  data movements by the compiler. This change also requires dereferencing
  the pointers to nCellsSolve and nVertLevels, in order to avoid runtime
  error messages such as the following:

FATAL ERROR: data in PRESENT clause was not found on device 1:name=ncellssolve
@abishekg7 abishekg7 force-pushed the atmosphere/port_divergence_damping_3d branch from 3b82420 to 7db74f8 Compare January 17, 2025 01:46
@abishekg7
Copy link
Collaborator Author

Cleaned up the commit history. 1 commit seemed sufficient. Also, tested the rebased version in limited area mode and restart files are still bit identical.

@gdicker1
Copy link
Collaborator

@abishekg7 one change, and I think this is good.

  • The data transfers for the invariant fields have been moved to
    mpas_atm_dynamics_init and finalize.

Could you change "have been moved"? It implies such data movement was in the ...damping_3d routine before this commit.

@mgduda mgduda merged commit 37aa961 into MPAS-Dev:develop Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants